Overview

Dataset info

Number of variables: 15
Number of observations: 1274900
Missing cells: 0 (0.0%)
Duplicate rows: 154 (< 0.1%)
Total size in memory: 494.5 MiB
Average record size in memory: 406.7 B
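
The average record size follows from the other figures in this block. A minimal Python sketch of that arithmetic, assuming 1 MiB = 1024 × 1024 bytes and using only numbers copied from the report (the variable names are illustrative):

```python
# Figures copied from the "Dataset info" block.
n_rows = 1_274_900
total_mib = 494.5
duplicate_rows = 154

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_mib * 1024 * 1024 / n_rows
duplicate_share = duplicate_rows / n_rows * 100

print(f"average record size: {avg_record_bytes:.1f} B")
print(f"duplicate rows: {duplicate_share:.3f}% of all rows")
```

The duplicate share comes out well under 0.1%, which is why the report prints it as "< 0.1%".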

Variables types

NUM: 8
CAT: 5
BOOL: 1
DATE: 1

Reproduction info

Date of analysis: 2020-02-18 07:34:18.956449
Version: pandas-profiling v2.4.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download Configuration: config.yaml

Warnings

Dataset has 154 (< 0.1%) duplicate rows Warning
Dport has 118774 (9.3%) zeros Zeros
DstAddr has a high cardinality: 634 distinct values Warning
dTos has 879713 (69.0%) zeros Zeros
Dur has 19550 (1.5%) zeros Zeros
SrcAddr has a high cardinality: 225 distinct values Warning
SrcBytes is highly skewed (γ1 = 33.72806189) Skewed
SrcBytes has 17444 (1.4%) zeros Zeros
State has a high cardinality: 58 distinct values Warning
sTos has 1251404 (98.2%) zeros Zeros
TotBytes is highly skewed (γ1 = 360.8529499) Skewed
TotPkts is highly skewed (γ1 = 102.1367655) Skewed
State is highly correlated with Dir High Correlation
Dir is highly correlated with State High Correlation
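
The Skewed warnings quote γ1, the skewness coefficient. As a rough illustration, the sketch below computes the population form of γ1 (third central moment divided by the variance raised to the power 1.5) in plain Python. The function name and the toy data are illustrative only, and pandas applies a small-sample bias correction that should be negligible at 1,274,900 rows.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2 ** 1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]                       # toy data, no tail
long_tail = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1000]     # toy data, one huge value

print(skewness(symmetric))
print(skewness(long_tail))
```

Judging only by this report, the warning threshold lies somewhere between 16.2 (sTos, not flagged as skewed) and 33.7 (SrcBytes, flagged).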

Variables

Dir
Categorical

HIGH CORRELATION
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 MiB

Value Count Frequency (%)
-> 1192811 93.6%
<?> 58084 4.6%
?> 12357 1.0%
<-> 11641 0.9%
<- 7 < 0.1%

Composition

Contains chars: False
Contains digits: False
Contains whitespace: True
Contains non-words: True

Length

Max length: 5
Mean length: 4.999994509
Min length: 4

Dport
Real number (ℝ≥0)

ZEROS
Distinct count: 15572
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5767.715885167464
Minimum: 0
Maximum: 65534
Zeros: 118774
Zeros (%): 9.3%
Memory size: 9.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 443
Median: 443
Q3: 8080
95-th percentile: 50641
Maximum: 65534
Range: 65534
Interquartile range (IQR): 7637

Descriptive statistics

Standard deviation: 12904.24816
Coefficient of variation (CV): 2.237323824
Kurtosis: 11.4478629
Mean: 5767.715885
Median Absolute Deviation (MAD): 6974.416568
Skewness: 3.503504887
Sum: 7353260982
Variance: 166519620.5
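
Several of the Dport figures are derived from one another. The following Python sketch, using only values copied from this section (variable names are illustrative), shows the relationships one would expect to hold: CV = standard deviation / mean, IQR = Q3 − Q1, range = maximum − minimum, variance = standard deviation squared, and mean = sum / number of observations.

```python
# Values copied from the Dport statistics and the dataset overview.
std = 12904.24816
mean = 5767.715885
q1, q3 = 443, 8080
minimum, maximum = 0, 65534
total, n_rows = 7353260982, 1274900

cv = std / mean                # coefficient of variation
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
variance = std ** 2
mean_from_sum = total / n_rows

print(cv, iqr, value_range, variance, mean_from_sum)
```

The same checks can be repeated for any other numeric variable in the report by swapping in its figures.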
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 2.00000e+00 1.45000e+01 3.90000e+01 6.00000e+01 ... 5.76205e+04 5.76215e+04 5.83385e+04 6.54485e+04 6.55340e+04], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
443 633473 49.7%
8080 358866 28.1%
0 118774 9.3%
1900 56367 4.4%
449 8964 0.7%
53 7384 0.6%
547 5590 0.4%
5355 4860 0.4%
546 2350 0.2%
445 2262 0.2%
Other values (15562) 76010 6.0%

Minimum 5 values
Value Count Frequency (%)
0 118774 9.3%
4 437 < 0.1%
25 52 < 0.1%
53 7384 0.6%
67 158 < 0.1%

Maximum 5 values
Value Count Frequency (%)
65534 3 < 0.1%
65533 4 < 0.1%
65532 4 < 0.1%
65531 6 < 0.1%
65530 3 < 0.1%

DstAddr
Categorical

HIGH CARDINALITY
Distinct count: 634
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 MiB

Value Count Frequency (%)
239.255.255.250 56385 4.4%
36.66.107.162 28050 2.2%
103.4.18.170 20800 1.6%
87.106.77.193 19959 1.6%
45.79.186.178 19379 1.5%
46.163.78.94 19379 1.5%
82.165.142.107 19379 1.5%
81.88.24.211 19379 1.5%
62.75.145.252 19378 1.5%
178.79.172.45 19376 1.5%
Other values (624) 1033436 81.1%

Composition

Contains chars: True
Contains digits: True
Contains whitespace: False
Contains non-words: True

Length

Max length: 35
Mean length: 13.23139305
Min length: 7

dTos
Real number (ℝ)

ZEROS
Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9461345987920622
Minimum: -1.0
Maximum: 164.0
Zeros: 879713
Zeros (%): 69.0%
Memory size: 9.7 MiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: -1
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 164
Range: 165
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 18.44552088
Coefficient of variation (CV): 9.478029369
Kurtosis: 70.71414942
Mean: 1.946134599
Median Absolute Deviation (MAD): 4.414838723
Skewness: 8.458025833
Sum: 2481127
Variance: 340.2372407
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ -1. -0.5 4. 20. 36. 56. 76. 126. 164. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 879713 69.0%
-1 374117 29.3%
164 15685 1.2%
40 3136 0.2%
72 1901 0.1%
32 171 < 0.1%
88 147 < 0.1%
80 27 < 0.1%
8 3 < 0.1%

Minimum 5 values
Value Count Frequency (%)
-1 374117 29.3%
0 879713 69.0%
8 3 < 0.1%
32 171 < 0.1%
40 3136 0.2%

Maximum 5 values
Value Count Frequency (%)
164 15685 1.2%
88 147 < 0.1%
80 27 < 0.1%
72 1901 0.1%
40 3136 0.2%

Dur
Real number (ℝ≥0)

ZEROS
Distinct count: 517021
Unique (%): 40.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 278.54504866455954
Minimum: 0.0
Maximum: 3600.0
Zeros: 19550
Zeros (%): 1.5%
Memory size: 9.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.000176
Q1: 0.420203
Median: 8.998371
Q3: 76.301514
95-th percentile: 3505.510742
Maximum: 3600
Range: 3600
Interquartile range (IQR): 75.881311

Descriptive statistics

Standard deviation: 879.6331257
Coefficient of variation (CV): 3.157956424
Kurtosis: 9.337380911
Mean: 278.5450487
Median Absolute Deviation (MAD): 452.6495024
Skewness: 3.347057964
Sum: 355117082.5
Variance: 773754.4359
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000000e+00 2.00000000e-06 2.85000000e-05 6.15000000e-05 1.03500000e-04 ... 3.59999524e+03 3.59999670e+03 3.59999915e+03 3.59999988e+03 3.60000000e+03], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 19550 1.5%
0.000158 1092 0.1%
0.000155 1051 0.1%
0.000149 1048 0.1%
0.000159 1047 0.1%
0.00015 1045 0.1%
0.000153 1041 0.1%
0.000148 1038 0.1%
0.000156 1026 0.1%
0.000147 1024 0.1%
Other values (517011) 1245938 97.7%

Minimum 5 values
Value Count Frequency (%)
0 19550 1.5%
4e-06 4 < 0.1%
5e-06 4 < 0.1%
6e-06 3 < 0.1%
7e-06 6 < 0.1%

Maximum 5 values
Value Count Frequency (%)
3600 49 < 0.1%
3599.999756 48 < 0.1%
3599.999512 46 < 0.1%
3599.999268 23 < 0.1%
3599.999023 15 < 0.1%

Label
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 MiB

Value Count Frequency (%)
1 995086 78.1%
0 279814 21.9%

Proto
Categorical

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 MiB

Value Count Frequency (%)
tcp 1076641 84.4%
ipv6-icmp 118979 9.3%
udp 79032 6.2%
llc 190 < 0.1%
46846 24 < 0.1%
rtcp 14 < 0.1%
47413 12 < 0.1%
rtp 2 < 0.1%
24533 2 < 0.1%
30718 2 < 0.1%

Composition

Contains chars: True
Contains digits: True
Contains whitespace: False
Contains non-words: True

Length

Max length: 9
Mean length: 3.560021963
Min length: 3

Sport
Real number (ℝ≥0)

Distinct count: 18105
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48384.84700054906
Minimum: 0
Maximum: 65534
Zeros: 47
Zeros (%): < 0.1%
Memory size: 9.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 130
Q1: 50643
Median: 55062
Q3: 59876
95-th percentile: 64325
Maximum: 65534
Range: 65534
Interquartile range (IQR): 9233

Descriptive statistics

Standard deviation: 19961.10736
Coefficient of variation (CV): 0.4125487336
Kurtosis: 1.362940059
Mean: 48384.847
Median Absolute Deviation (MAD): 14139.89247
Skewness: -1.725002047
Sum: 6.168584144e+10
Variance: 398445806.9
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-01 2.70000e+01 6.75000e+01 9.90000e+01 ... 6.28505e+04 6.42665e+04 6.54425e+04 6.55335e+04 6.55340e+04], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
130 85351 6.7%
135 31217 2.4%
10125 10952 0.9%
10126 10503 0.8%
10121 9157 0.7%
10120 9153 0.7%
10118 9152 0.7%
10117 9151 0.7%
8080 8098 0.6%
546 5590 0.4%
Other values (18095) 1086576 85.2%

Minimum 5 values
Value Count Frequency (%)
0 47 < 0.1%
1 622 < 0.1%
53 19 < 0.1%
67 11 < 0.1%
68 158 < 0.1%

Maximum 5 values
Value Count Frequency (%)
65534 52 < 0.1%
65533 49 < 0.1%
65532 49 < 0.1%
65531 48 < 0.1%
65530 47 < 0.1%

SrcAddr
Categorical

HIGH CARDINALITY
Distinct count: 225
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 MiB

Value Count Frequency (%)
10.0.2.102 161551 12.7%
192.168.1.115 133465 10.5%
192.168.1.122 114129 9.0%
192.168.1.124 82916 6.5%
192.168.1.121 80099 6.3%
192.168.1.127 77092 6.0%
192.168.1.125 70581 5.5%
192.168.1.2 58068 4.6%
192.168.1.118 44508 3.5%
192.168.1.113 44244 3.5%
Other values (215) 408247 32.0%

Composition

Contains chars: True
Contains digits: True
Contains whitespace: False
Contains non-words: True

Length

Max length: 35
Mean length: 13.68688917
Min length: 2

SrcBytes
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 2852
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 777.1835893011216
Minimum: 0
Maximum: 214850
Zeros: 17444
Zeros (%): 1.4%
Memory size: 9.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 172
Q1: 194
Median: 864
Q3: 977
95-th percentile: 2494
Maximum: 214850
Range: 214850
Interquartile range (IQR): 783

Descriptive statistics

Standard deviation: 1059.434673
Coefficient of variation (CV): 1.363171698
Kurtosis: 3757.499125
Mean: 777.1835893
Median Absolute Deviation (MAD): 498.6688795
Skewness: 33.72806189
Sum: 990831358
Variance: 1122401.827
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 2.70000e+01 5.70000e+01 6.10000e+01 6.30000e+01 ... 4.07510e+04 4.93440e+04 5.56800e+04 8.84185e+04 2.14850e+05], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
194 225869 17.7%
318 121148 9.5%
956 109284 8.6%
172 68635 5.4%
864 49613 3.9%
954 46063 3.6%
1063 39771 3.1%
1015 38813 3.0%
1068 30834 2.4%
314 30473 2.4%
Other values (2842) 514397 40.3%

Minimum 5 values
Value Count Frequency (%)
0 17444 1.4%
54 6 < 0.1%
60 160 < 0.1%
62 23 < 0.1%
64 2 < 0.1%

Maximum 5 values
Value Count Frequency (%)
214850 2 < 0.1%
136894 1 < 0.1%
135080 1 < 0.1%
91394 1 < 0.1%
85443 1 < 0.1%
 
StartTime
Date

Distinct count: 1274020
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 MiB
Minimum: 1970-01-01 01:00:00
Maximum: 1970-03-05 01:19:27.214865
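
The span covered by the date variable can be read off its Minimum and Maximum timestamps. A small Python sketch using only the standard library; the format string and variable names are illustrative, and zero microseconds are appended to the minimum so that both values parse with the same format:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Minimum and Maximum rows; ".000000" is
# appended to the minimum only so one format string fits both values.
start = datetime.strptime("1970-01-01 01:00:00.000000", FMT)
end = datetime.strptime("1970-03-05 01:19:27.214865", FMT)

span = end - start
print(span.days, "days,", span.seconds, "seconds,", span.microseconds, "microseconds")
```

The dates fall in early 1970, which would be consistent with relative capture times having been parsed as offsets from the Unix epoch. That is an interpretation, not something the report states.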

State
Categorical

HIGH CARDINALITY
HIGH CORRELATION
Distinct count: 58
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 MiB

Value Count Frequency (%)
FSPA_FSPA 184710 14.5%
S_ 175128 13.7%
SRPA_SA 152375 12.0%
SRPA_SPA 106300 8.3%
FSPA_FSRPA 95665 7.5%
MRQ 85351 6.7%
INT 70411 5.5%
FSRPA_SA 58378 4.6%
PA_R 56624 4.4%
S_RA 53613 4.2%
Other values (48) 236345 18.5%

Composition

Contains chars: True
Contains digits: False
Contains whitespace: False
Contains non-words: False

Length

Max length: 11
Mean length: 6.26635893
Min length: 2

sTos
Real number (ℝ)

ZEROS
Distinct count: 150
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6381465212957879
Minimum: -1.0
Maximum: 255.0
Zeros: 1251404
Zeros (%): 98.2%
Memory size: 9.7 MiB

Quantile statistics

Minimum: -1
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 255
Range: 256
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 10.21568738
Coefficient of variation (CV): 16.00837275
Kurtosis: 265.8055822
Mean: 0.6381465213
Median Absolute Deviation (MAD): 1.298195917
Skewness: 16.20275946
Sum: 813573
Variance: 104.3602687
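
The MAD row for sTos deserves a second look: with 98.2% of the values equal to 0, a true median absolute deviation would have to be 0, yet the report shows about 1.298. The sketch below checks this from the counts in this section alone. The helper name is illustrative, and because every value other than 0 and -1 is at least 1, a stand-in value of 1 is enough to locate the medians.

```python
def weighted_median(table):
    """Lower median of a frequency table given as (value, count) pairs."""
    total = sum(count for _, count in table)
    running = 0
    for value, count in sorted(table):
        running += count
        if 2 * running >= total:
            return value

n_rows = 1_274_900
zeros = 1_251_404                      # rows with sTos == 0
minus_ones = 17_676                    # rows with sTos == -1
rest = n_rows - zeros - minus_ones     # remaining rows, all with sTos >= 1

# Median of sTos itself (1 stands in for every value above 0).
median = weighted_median([(-1, minus_ones), (0, zeros), (1, rest)])

# Absolute deviations from that median: 0 for the zeros, at least 1 otherwise.
median_abs_dev = weighted_median([(0, zeros), (1, n_rows - zeros)])

reported_mad = 1.298195917
print(median, median_abs_dev, reported_mad)
```

The mismatch suggests that the row may actually hold a mean absolute deviation, the quantity that the older pandas `Series.mad()` method computed. Treat that reading as an assumption rather than a confirmed detail of this report.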
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ -1. -0.5 0.5 8.5 14.5 ... 238.5 239.5 247.5 248.5 255. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 1251404 98.2%
-1 17676 1.4%
164 4266 0.3%
40 717 0.1%
16 44 < 0.1%
225 36 < 0.1%
44 31 < 0.1%
23 28 < 0.1%
24 26 < 0.1%
239 24 < 0.1%
Other values (140) 648 0.1%

Minimum 5 values
Value Count Frequency (%)
-1 17676 1.4%
0 1251404 98.2%
1 2 < 0.1%
8 2 < 0.1%
9 2 < 0.1%

Maximum 5 values
Value Count Frequency (%)
255 2 < 0.1%
253 2 < 0.1%
251 2 < 0.1%
249 2 < 0.1%
248 19 < 0.1%

TotBytes
Real number (ℝ≥0)

SKEWED
Distinct count: 5270
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1978.9317099380344
Minimum: 54
Maximum: 8036184
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.7 MiB

Quantile statistics

Minimum: 54
5-th percentile: 194
Q1: 374
Median: 1129
Q3: 3010
95-th percentile: 4323
Maximum: 8036184
Range: 8036130
Interquartile range (IQR): 2636

Descriptive statistics

Standard deviation: 10715.74987
Coefficient of variation (CV): 5.414916449
Kurtosis: 251406.1358
Mean: 1978.93171
Median Absolute Deviation (MAD): 1696.971828
Skewness: 360.8529499
Sum: 2522940037
Variance: 114827295.2
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[5.400000e+01 5.700000e+01 6.100000e+01 6.300000e+01 6.500000e+01 ... 5.773820e+05 5.788320e+05 5.845685e+05 6.756830e+05 8.036184e+06], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
194 172265 13.5%
438 121144 9.5%
226 56622 4.4%
374 53381 4.2%
864 49530 3.9%
1717 49238 3.9%
1130 40380 3.2%
3981 38557 3.0%
430 31808 2.5%
1720 27587 2.2%
Other values (5260) 634388 49.8%

Minimum 5 values
Value Count Frequency (%)
54 1 < 0.1%
60 160 < 0.1%
62 23 < 0.1%
64 2 < 0.1%
66 2852 0.2%

Maximum 5 values
Value Count Frequency (%)
8036184 1 < 0.1%
2580077 1 < 0.1%
893155 1 < 0.1%
682079 1 < 0.1%
669287 1 < 0.1%

TotPkts
Real number (ℝ≥0)

SKEWED
Distinct count: 318
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.78514393285748
Minimum: 1
Maximum: 6055
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.7 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
Median: 9
Q3: 17
95-th percentile: 29
Maximum: 6055
Range: 6054
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 12.68916521
Coefficient of variation (CV): 1.17654111
Kurtosis: 41821.7867
Mean: 10.78514393
Median Absolute Deviation (MAD): 6.688930975
Skewness: 102.1367655
Sum: 13749980
Variance: 161.0149138
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.000e+00 1.500e+00 2.500e+00 3.500e+00 4.500e+00 ... 5.295e+02 5.475e+02 5.910e+02 8.035e+02 6.055e+03], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
6 211299 16.6%
3 189402 14.9%
10 169325 13.3%
18 119852 9.4%
2 84014 6.6%
9 72303 5.7%
19 66530 5.2%
8 61206 4.8%
17 59765 4.7%
4 57600 4.5%
Other values (308) 183604 14.4%

Minimum 5 values
Value Count Frequency (%)
1 19597 1.5%
2 84014 6.6%
3 189402 14.9%
4 57600 4.5%
5 3221 0.3%

Maximum 5 values
Value Count Frequency (%)
6055 1 < 0.1%
2279 1 < 0.1%
1479 1 < 0.1%
1452 1 < 0.1%
825 1 < 0.1%

Correlations

Missing values

Sample

First rows

   Dir  Dport  DstAddr            dTos  Dur          Label  Proto      Sport  SrcAddr                    SrcBytes  StartTime                   State  sTos  TotBytes  TotPkts
0  ->   0      00:00:00:00:00:00  -1.0  0.000000     0      llc        0      00:00:00:00:00:00          60        1970-01-01 01:00:00.000000  INT    -1.0  60        1
1  ->   547    ff02::1:2          -1.0  3479.992676  0      udp        546    fe80::6dd3:1409:3456:8562  9198      1970-01-01 01:00:06.524077  INT    0.0   9198      63
2  ->   0      ff02::1:ff56:8562  -1.0  0.000000     0      ipv6-icmp  135    ::                         78        1970-01-01 01:00:06.653993  NNS    0.0   78        1
3  ->   0      ff02::2            -1.0  8.001393     0      ipv6-icmp  133    fe80::6dd3:1409:3456:8562  210       1970-01-01 01:00:06.654046  NRS    0.0   210       3
4  ->   0      ff02::16           -1.0  0.500216     0      ipv6-icmp  143    fe80::6dd3:1409:3456:8562  180       1970-01-01 01:00:06.654282  MHR    0.0   180       2
5  <->  53     8.8.8.8            0.0   4.003234     1      udp        60071  10.0.2.102                 152       1970-01-01 01:00:12.595887  CON    0.0   244       3
6  ->   53     4.4.4.4            -1.0  3.004245     1      udp        60071  10.0.2.102                 228       1970-01-01 01:00:13.593757  INT    0.0   228       3
7  <->  53     8.8.8.8            0.0   0.001224     1      udp        55040  10.0.2.102                 76        1970-01-01 01:00:16.599814  CON    0.0   180       2
8  ->   5355   ff02::1:3          -1.0  0.097591     0      udp        56744  fe80::6dd3:1409:3456:8562  168       1970-01-01 01:02:32.956442  INT    0.0   168       2
9  ->   5355   224.0.0.252        -1.0  0.097688     1      udp        64752  10.0.2.102                 128       1970-01-01 01:02:32.956736  INT    0.0   128       2

Last rows

         Dir  Dport  DstAddr      dTos  Dur          Label  Proto  Sport  SrcAddr                    SrcBytes  StartTime                   State  sTos  TotBytes  TotPkts
1274890  ->   8080   202.44.54.4  0.0   901.600342   0      tcp    49809  10.0.2.102                 755       1970-01-07 10:19:35.747418  RST    0.0   1347      11
1274891  ->   8080   202.44.54.4  0.0   901.689575   0      tcp    49810  10.0.2.102                 755       1970-01-07 10:34:37.348084  RST    0.0   1347      11
1274892  ->   547    ff02::1:2    -1.0  3479.460693  0      udp    546    fe80::6dd3:1409:3456:8562  9198      1970-01-07 10:43:59.248623  REQ    0.0   9198      63
1274893  ->   8080   202.44.54.4  0.0   901.442688   0      tcp    49811  10.0.2.102                 755       1970-01-07 10:49:39.037949  RST    0.0   1347      11
1274894  ->   8080   202.44.54.4  0.0   901.459656   0      tcp    49812  10.0.2.102                 755       1970-01-07 11:04:40.480971  RST    0.0   1347      11
1274895  ->   8080   202.44.54.4  0.0   901.006653   0      tcp    49813  10.0.2.102                 755       1970-01-07 11:19:41.941002  RST    0.0   1347      11
1274896  ->   8080   202.44.54.4  0.0   901.578491   0      tcp    49814  10.0.2.102                 755       1970-01-07 11:34:42.947968  RST    0.0   1347      11
1274897  ->   547    ff02::1:2    -1.0  1350.547119  0      udp    546    fe80::6dd3:1409:3456:8562  4088      1970-01-07 11:48:02.717418  REQ    0.0   4088      28
1274898  ->   8080   202.44.54.4  0.0   905.247253   0      tcp    49815  10.0.2.102                 755       1970-01-07 11:49:44.526804  RST    0.0   1347      11
1274899  ->   8080   202.44.54.4  0.0   17.169039    0      tcp    49816  10.0.2.102                 701       1970-01-07 12:04:49.751908  FIN    0.0   1239      9